Creating the DISEQuA Corpus: a Test Set for Multilingual Question Answering
نویسندگان
چکیده
This paper describes the procedure adopted by the three coordinators of the CLEF 2003 question answering track (ITC-irst, UNED and ILLC) to create the question set for the monolingual tasks. Despite the little resources available, the three groups collaborated and managed to formulate and verify a large pool of original questions posed in three different languages: Dutch, Italian and Spanish. A part of these queries was translated into English and shared between the three coordination groups. Thus, a second cross-verification was conducted, in order to extract the queries that had an answer in all the three monolingual document collections. Finally, the result of the joint efforts was the creation of the DISEQuA (Dutch Italian Spanish English Questions and Answers) corpus, a useful and reusable resource that is freely available for the research community. The article reports on the different stages of the corpus creation, from the monolingual kernels to the multilingual extension.
منابع مشابه
ارایه یک پیکره پرسش و پاسخ مذهبی در زبان فارسی
Question answering system is a field in natural language processing and information retrieval noticed by researchers in these decades. Due to a growing interest in this field of research, the need to have appropriate data sources is perceived. Most researches about developing question answering corpus area have been done in English so far, but in other languages as Persian, the lack of these co...
متن کاملThe First Cross-Script Code-Mixed Question Answering Corpus
In this paper, we formally introduce the problem of crossscript code-mixed question answering (QA) and we elaborate the corpus acquisition process and an evaluation strategy related to the said problem. Today social media platforms are flooded by millions of posts everyday on various topics. This paper emphasizes the use of such ever growing user generated content to serve as information collec...
متن کاملMultilingual Pattern Libraries for Question Answering: a Case Study for Definition Questions
In this paper we investigate the effectiveness of a novel resource for Multilingual Question Answering (QA). Such a resource consists of a set of multilingual pattern libraries for answer extraction and validation. In the spirit of the ongoing attempts to develop freely available resources for QA, we argue that the distribution and use of pattern libraries will contribute to make Multilingual Q...
متن کاملReflections on TREC QA
The TREC (later TAC) Question Answering track reinvigorated the question answering research community, fostering extensive research on different question types and finding answers in different kinds of corpora. Parallel evaluations extended the research further to include a variety of languages and media types. In recent years, the TAC QA track evolved into the Knowledge Base Population (KBP) t...
متن کاملThe Multiple Language Question Answering Track at CLEF 2003
This paper reports on the pilot question answering track that was carried out within the CLEF initiative this year. The track was divided into monolingual and bilingual tasks: monolingual systems were evaluated within the frame of three non-English European languages, Dutch, Italian and Spanish, while in the crosslanguage tasks an English document collection constituted the target corpus for It...
متن کامل